Volumetric measurements of fetal structures in MRI are time-consuming and prone to error, and therefore require automatic segmentation. Placenta segmentation and accurate fetal brain segmentation for gyrification assessment are particularly challenging because of the placenta's fuzzy boundaries and the complex foldings of the fetal brain cortex. In this paper, we study the use of the contour Dice loss for these problems and compare it to other boundary losses and to the combined Dice and cross-entropy loss. The loss is computed efficiently for each slice via erosion, dilation, and XOR operators. We describe a new formulation of the loss that is analogous to the contour Dice metric. The combination of the Dice loss and the contour Dice loss yielded the best performance for placenta segmentation. For fetal brain segmentation, the best performing loss was the combined Dice and cross-entropy loss, followed by the Dice and contour Dice loss, which performed better than the other boundary losses.
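The per-slice computation described above can be illustrated with standard morphology operators. Below is a minimal NumPy/SciPy sketch of a contour-Dice-style score for binary masks; the function names and the `tolerance` band width are illustrative assumptions, and the paper's actual loss is a differentiable formulation over soft predictions, which this hard, metric-style sketch does not capture.

```python
import numpy as np
from scipy.ndimage import binary_erosion, binary_dilation

def contour(mask):
    # Slice contour as the XOR of the mask with its erosion.
    return np.logical_xor(mask, binary_erosion(mask))

def contour_dice(pred, gt, tolerance=2):
    """Contour-Dice-style score for one binary 2D slice.

    `tolerance` is the width (in dilation iterations) of the band
    within which the two contours are considered to match.
    """
    c_pred, c_gt = contour(pred), contour(gt)
    b_pred = binary_dilation(c_pred, iterations=tolerance)
    b_gt = binary_dilation(c_gt, iterations=tolerance)
    inter = np.logical_and(c_pred, b_gt).sum() + np.logical_and(c_gt, b_pred).sum()
    denom = c_pred.sum() + c_gt.sum()
    return inter / denom if denom > 0 else 1.0

def contour_dice_volume(pred_vol, gt_vol, tolerance=2):
    # Average the per-slice scores, matching the slice-wise computation.
    return float(np.mean([contour_dice(p, g, tolerance)
                          for p, g in zip(pred_vol, gt_vol)]))
```

A training loss would typically be `1 - contour_dice_volume(...)`, with the hard morphology replaced by a soft approximation (e.g. min/max pooling) so that gradients can flow.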
Deep learning methods have been shown to be effective for segmenting structures and pathologies in medical imaging. However, they require large annotated datasets whose manual segmentation is a tedious and time-consuming task, especially for large structures. We present a new method of partial annotation that uses a small set of consecutive annotated slices from each scan, with an annotation effort equal to that of only a few fully annotated cases. Training with partial annotation is performed by using only the annotated blocks, incorporating information about the slices outside the structure of interest, and modifying the batch loss function to consider only the annotated slices. To facilitate training in a low-data regime, we use a two-step optimization process. We tested the method with the popular soft Dice loss on two MRI sequences, TRUFI and FIESTA, and compared the full annotation regime to partial annotation with a similar annotation effort. For the TRUFI data, partial annotation performed slightly better on average than full annotation, with the Dice score increasing from 0.936 to 0.942, a substantial 22% decrease in the standard deviation (STD) of the Dice score, and a 15% improvement in the average symmetric surface distance (ASSD). For the FIESTA sequence, partial annotation also reduced the STD of the Dice score and ASSD metrics, by 27.5% and 33% respectively, on in-distribution data, and improved out-of-distribution performance, with the Dice score increasing from 0.84 to 0.9 and the ASSD decreasing from 7.46 to 4.01 mm. The two-step optimization process helped partial annotation on both in-distribution and out-of-distribution data. The partial annotation method with the two-step optimizer is therefore recommended for improving segmentation performance in low-data regimes.
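The modified batch loss is the core mechanism here. Below is a minimal PyTorch sketch of a soft Dice loss restricted to annotated slices, assuming a (B, D, H, W) tensor layout and a per-slice annotation mask; all names and shapes are illustrative, and the paper's two-step optimization is not shown.

```python
import torch

def partial_soft_dice_loss(probs, labels, annotated, eps=1e-6):
    """Soft Dice loss computed only over annotated slices.

    probs:     (B, D, H, W) predicted foreground probabilities
    labels:    (B, D, H, W) binary ground truth (zeros outside the block)
    annotated: (B, D) boolean mask marking the consecutive annotated slices
    """
    mask = annotated[..., None, None].float()   # broadcast over H and W
    p, g = probs * mask, labels.float() * mask  # zero out unannotated slices
    inter = (p * g).sum(dim=(1, 2, 3))
    denom = p.sum(dim=(1, 2, 3)) + g.sum(dim=(1, 2, 3))
    dice = (2 * inter + eps) / (denom + eps)
    return 1 - dice.mean()
```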
Normal fetal adipose tissue (AT) development is essential for perinatal well-being. AT, or simply fat, stores energy in the form of lipids. Malnourishment may result in excessive or depleted adiposity. Although previous studies have shown a correlation between the amount of AT and perinatal outcomes, prenatal assessment of AT is limited by the lack of quantitative methods. Using magnetic resonance imaging (MRI), 3D fat- and water-only images of the entire fetus can be obtained from two-point Dixon images, enabling lipid quantification. This paper is the first to present a deep learning method for fetal fat segmentation based on Dixon MRI. It optimizes the radiologist's manual fetal fat delineation time needed to generate the annotated training dataset. It consists of two steps: 1) model-based semi-automatic fetal fat segmentation, reviewed and corrected by a radiologist; 2) automatic fetal fat segmentation using DL networks trained on the resulting annotated dataset. Three DL networks were trained. Compared to manual segmentation, we show a significant improvement in segmentation time (3:38 hours to <1 hour) and in observer variability (Dice score of 0.738 to 0.906). Automatic segmentation of 24 test cases with the 3D Residual U-Net, nnU-Net, and SWIN-UNETR transformer networks yielded mean Dice scores of 0.863, 0.787, and 0.856, respectively. These results are better than the manual observer variability and comparable to automatic adult and pediatric fat segmentation. Six new independent cases segmented with the best performing network were reviewed and corrected by a radiologist, resulting in a Dice score of 0.961 and a significantly reduced correction time of 15:20 minutes. With these novel segmentation methods and short MRI acquisition times, whole-body subcutaneous lipids can be quantified for individual fetuses in the clinic and in large-cohort research.
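For context, the fat- and water-only images mentioned above follow from the textbook two-point Dixon decomposition. The sketch below shows that decomposition and a voxel-wise fat-fraction map, assuming magnitude in-phase/out-of-phase images; this is the standard formulation, not necessarily the exact reconstruction pipeline used in the paper.

```python
import numpy as np

def two_point_dixon(in_phase, out_phase):
    """Fat-only and water-only images from two-point Dixon acquisitions.

    Standard two-point Dixon decomposition for magnitude images,
    assuming water-dominant voxels and no field-inhomogeneity correction.
    """
    water = (in_phase + out_phase) / 2.0
    fat = (in_phase - out_phase) / 2.0
    return fat, water

def fat_fraction(fat, water, eps=1e-6):
    # Voxel-wise signal fat fraction used for lipid quantification.
    return fat / (fat + water + eps)
```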
Fetal growth assessment from ultrasound is based on a few biometric measurements that are performed manually and assessed relative to the expected gestational age. Reliable biometry estimation depends on the precise detection of landmarks in standard ultrasound planes. Manual annotation can be a time-consuming and operator-dependent task, and may lead to high measurement variability. Existing methods for automatic fetal biometry rely on an initial automatic fetal structure segmentation followed by geometric landmark detection. However, segmentation annotations are time-consuming and may be inaccurate, and landmark detection requires developing measurement-specific geometric methods. This paper describes BiometryNet, an end-to-end landmark regression framework for fetal biometry estimation that overcomes these limitations. It includes a novel Dynamic Orientation Determination (DOD) method for enforcing measurement-specific orientation consistency during network training. DOD reduces variability in network training, increases landmark localization accuracy, and thus yields accurate and robust biometric measurements. To validate our method, we assembled a dataset of 3,398 ultrasound images from 1,829 subjects acquired at three clinical sites with seven different ultrasound devices. Comparison and cross-validation of three different biometric measurements on two independent datasets show that BiometryNet is robust and yields accurate measurements whose errors are lower than the clinically permissible errors, outperforming other existing automated biometry estimation methods. Code is available at https://github.com/netanellavisdris/fetalbiometry.
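Once the network has regressed the two endpoints of a measurement, the biometric value itself reduces to a calibrated distance in the image plane. A minimal sketch under that assumption follows; the names and the (row, col) coordinate convention are illustrative, not taken from the paper.

```python
import numpy as np

def biometric_measurement(landmark_a, landmark_b, pixel_spacing_mm):
    """Distance-based biometric measurement from two predicted landmarks.

    landmark_a/b:      (row, col) coordinates regressed by the network.
    pixel_spacing_mm:  (row_spacing, col_spacing) of the ultrasound image,
                       allowing for anisotropic pixels.
    """
    d = (np.asarray(landmark_a, float) - np.asarray(landmark_b, float))
    d *= np.asarray(pixel_spacing_mm, float)
    return float(np.linalg.norm(d))  # measurement in millimeters
```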
Extracting complex structures from grid-based data is a common key step in automated medical image analysis. The conventional solution to recovering tree-structured geometries typically involves computing the minimal cost path through intermediate representations derived from segmentation masks. However, this methodology has significant limitations in the context of projective imaging of tree-structured 3D anatomical data such as coronary arteries, since there are often overlapping branches in the 2D projection. In this work, we propose a novel approach to predicting tree connectivity structure which reformulates the task as an optimization problem over individual steps of a recursive process. We design and train a two-stage model which leverages the UNet and Transformer architectures and introduces an image-based prompting technique. Our proposed method achieves compelling results on a pair of synthetic datasets, and outperforms a shortest-path baseline.
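The shortest-path baseline referenced above is conventionally a Dijkstra trace over a cost map derived from the segmentation (e.g. cost = 1 - foreground probability). The paper does not spell out its baseline's details, so the following is an illustrative sketch assuming a 2D grid, 4-connectivity, and a goal reachable from the start.

```python
import heapq
import numpy as np

def min_cost_path(cost, start, goal):
    """Minimal-cost path on a 2D grid via Dijkstra's algorithm.

    cost:  (H, W) non-negative array, e.g. 1 - foreground probability.
    start, goal: (row, col) tuples; goal is assumed reachable.
    """
    h, w = cost.shape
    dist = np.full((h, w), np.inf)
    prev = {}
    dist[start] = cost[start]
    pq = [(cost[start], start)]
    while pq:
        d, (r, c) = heapq.heappop(pq)
        if (r, c) == goal:
            break
        if d > dist[r, c]:
            continue  # stale queue entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and d + cost[nr, nc] < dist[nr, nc]:
                dist[nr, nc] = d + cost[nr, nc]
                prev[(nr, nc)] = (r, c)
                heapq.heappush(pq, (dist[nr, nc], (nr, nc)))
    # Walk the predecessor map back from the goal to recover the path.
    path, node = [goal], goal
    while node != start:
        node = prev[node]
        path.append(node)
    return path[::-1]
```

As the abstract notes, this kind of per-path tracing cannot disambiguate branches that overlap in a 2D projection, which motivates the recursive connectivity-prediction formulation.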
Curriculum learning and self-paced learning are training strategies that gradually feed samples into training from easy to more complex. They have attracted increasing attention due to their excellent performance in robotic vision. Most recent works focus on designing curricula based on the difficulty levels of input samples or on smoothing the feature maps. However, smoothing labels to control the learning utility in a curriculum manner is still unexplored. In this work, we design a paced curriculum by label smoothing (P-CBLS) using paced learning with uniform label smoothing (ULS) for classification tasks, and fuse uniform and spatially varying label smoothing (SVLS) in a curriculum manner for semantic segmentation tasks. In ULS and SVLS, a bigger smoothing factor enforces a heavier smoothing penalty on the true label and limits the model to learning less information. We therefore design the curriculum by label smoothing (CBLS): we set a bigger smoothing value at the beginning of training and gradually decrease it to zero, controlling the model's learning utility from lower to higher. We also design a confidence-aware pacing function and combine it with CBLS to investigate the benefits of various curricula. The proposed techniques are validated on four robotic surgery datasets covering multi-class classification, multi-label classification, captioning, and segmentation tasks. We also investigate the robustness of our method by corrupting the validation data to different severity levels. Our extensive analysis shows that the proposed method improves prediction accuracy and robustness.
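A minimal PyTorch sketch of ULS with a decaying CBLS schedule is shown below. The `eps_max` value and the linear decay are illustrative assumptions, and the confidence-aware pacing function and the SVLS variant for segmentation are omitted.

```python
import torch
import torch.nn.functional as F

def smoothing_factor(epoch, total_epochs, eps_max=0.3):
    """CBLS schedule: start with a large smoothing factor and decay it
    linearly to zero over training (eps_max and linearity are assumed)."""
    return eps_max * max(0.0, 1.0 - epoch / total_epochs)

def uls_loss(logits, targets, eps):
    """Cross-entropy with uniform label smoothing (ULS):
    y_smooth = (1 - eps) * one_hot + eps / num_classes."""
    n = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    y = (1 - eps) * F.one_hot(targets, n).float() + eps / n
    return -(y * log_probs).sum(dim=-1).mean()
```

With `eps = smoothing_factor(epoch, total_epochs)` recomputed each epoch, early training sees heavily smoothed targets (low learning utility) and late training sees the plain cross-entropy targets.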
Temporal reasoning is the task of predicting temporal relations of event pairs given their contexts. While some temporal reasoning models perform reasonably well on in-domain benchmarks, we have little idea of these systems' generalizability due to existing datasets' limitations. In this work, we introduce a novel task named TODAY that bridges this gap with temporal differential analysis, which, as the name suggests, evaluates whether systems can correctly understand the effect of incremental changes. Specifically, TODAY makes slight context changes for given event pairs, and systems need to tell how this subtle contextual change will affect temporal relation distributions. To facilitate learning, TODAY also annotates human explanations. We show that existing models, including GPT-3, drop to random guessing on TODAY, suggesting that they heavily rely on spurious information rather than proper reasoning for temporal predictions. On the other hand, we show that TODAY's supervision style and explanation annotations can be used in joint learning, encouraging models to use more appropriate signals during training and improving performance across several benchmarks. TODAY can also be used to train models to solicit incidental supervision from noisy sources such as GPT-3, moving farther towards generic temporal reasoning systems.
State-of-the-art 3D semantic segmentation models are trained on off-the-shelf public benchmarks, but they often face a major challenge when deployed to a new domain. In this paper, we propose an Active-and-Adaptive Segmentation (ADAS) baseline to enhance the weak cross-domain generalization ability of a well-trained 3D segmentation model and to bridge the point-distribution gap between domains. Specifically, before the cross-domain adaptation stage begins, ADAS performs an active sampling operation to select a maximally-informative subset from both the source and target domains for effective adaptation, reducing the adaptation difficulty in 3D scenarios. Benefiting from the rise of multi-modal 2D-3D datasets, ADAS utilizes a cross-modal attention-based feature fusion module that extracts a representative pair of image features and point features, achieving a bi-directional image-point feature interaction for safer adaptation. Experimentally, ADAS is verified to be effective in many cross-domain settings, including: 1) Unsupervised Domain Adaptation (UDA), in which all samples from the target domain are unlabeled; 2) Unsupervised Few-shot Domain Adaptation (UFDA), in which only a few unlabeled samples are available in the target domain; 3) Active Domain Adaptation (ADA), in which the target samples selected by ADAS are manually annotated. The results demonstrate that ADAS achieves a significant accuracy gain when easily coupled with self-training methods or off-the-shelf UDA works.
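The abstract does not specify the informativeness criterion behind the active sampling step; predictive entropy is a common stand-in. The sketch below selects the k highest-entropy scans under that assumption; shapes and names are illustrative, not ADAS's actual criterion.

```python
import torch

def entropy_select(point_logits, k):
    """Pick the k most 'informative' scans by mean predictive entropy.

    point_logits: list of (N_i, C) per-scan logits from the current model,
                  where N_i is the number of points in scan i.
    Returns the indices of the k highest-entropy scans.
    """
    scores = []
    for logits in point_logits:
        p = torch.softmax(logits, dim=-1)
        ent = -(p * torch.log(p + 1e-12)).sum(dim=-1)  # per-point entropy
        scores.append(ent.mean().item())               # scan-level score
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:k]
```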
As language models (LMs) scale, they develop many novel behaviors, good and bad, exacerbating the need to evaluate how they behave. Prior work creates evaluations with crowdwork (which is time-consuming and expensive) or existing data sources (which are not always available). Here, we automatically generate evaluations with LMs. We explore approaches with varying amounts of human effort, from instructing LMs to write yes/no questions to making complex Winogender schemas with multiple stages of LM-based generation and filtering. Crowdworkers rate the examples as highly relevant and agree with 90-100% of labels, sometimes more so than corresponding human-written datasets. We generate 154 datasets and discover new cases of inverse scaling where LMs get worse with size. Larger LMs repeat back a dialog user's preferred answer ("sycophancy") and express greater desire to pursue concerning goals like resource acquisition and goal preservation. We also find some of the first examples of inverse scaling in RL from Human Feedback (RLHF), where more RLHF makes LMs worse. For example, RLHF makes LMs express stronger political views (on gun rights and immigration) and a greater desire to avoid shut down. Overall, LM-written evaluations are high-quality and let us quickly discover many novel LM behaviors.
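A generation-and-filtering stage of the kind described can be sketched as a simple loop. Here `lm_generate` and `lm_filter` are hypothetical stand-ins for language-model calls (prompt in, text out), and the prompts are illustrative rather than the paper's.

```python
def generate_eval_dataset(lm_generate, lm_filter, instruction,
                          n_examples, max_tries=10000):
    """Sketch of multi-stage LM-based generation and filtering.

    lm_generate/lm_filter: hypothetical callables wrapping an LM;
    no specific API is implied.
    """
    examples = []
    for _ in range(max_tries):
        if len(examples) >= n_examples:
            break
        candidate = lm_generate(f"{instruction}\nWrite one yes/no question:")
        verdict = lm_filter(
            "Is the following a relevant, well-formed example? "
            f"Answer yes or no.\n{candidate}"
        )
        if verdict.strip().lower().startswith("yes"):
            examples.append(candidate)
    return examples
```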
In this paper, we discuss an imitation learning based method for reducing the calibration error of a mixed reality system consisting of a vision sensor and a projector. Unlike with a head-mounted display, in this setup augmented information is made available to a human subject by projecting a scene into the real world. Inherently, the camera and projector need to be calibrated as a stereo setup to project accurate information in 3D space. Previous calibration processes require multiple recording and parameter tuning steps to achieve the desired calibration, which is usually a time-consuming process. To avoid such tedious calibration, we train a CNN model to iteratively correct the extrinsic offset given a QR code and a projected pattern. We discuss the overall system setup, the data collection for training, and the results of the auto-correction model.
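The iterative auto-correction loop can be sketched as follows, assuming the CNN predicts an offset to the current extrinsic parameters from a captured image of the projected pattern and QR code; the 6-DoF vector representation, the subtractive update, and all names are illustrative assumptions.

```python
import numpy as np

def auto_correct_extrinsics(model, capture_fn, extrinsics, n_iters=5):
    """Iterative extrinsic auto-correction (sketch).

    model:      trained CNN mapping a captured image of the QR code and
                projected pattern to a predicted extrinsic offset.
    capture_fn: projects with the current extrinsics and returns the
                camera image of the scene.
    extrinsics: 6-DoF parameter vector (rotation + translation).
    """
    extrinsics = np.asarray(extrinsics, dtype=float)
    for _ in range(n_iters):
        image = capture_fn(extrinsics)    # project pattern, capture image
        offset = model(image)             # predicted extrinsic offset
        extrinsics = extrinsics - offset  # remove the estimated offset
    return extrinsics
```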